Abstract

Ads on the Internet are increasingly sold via ad exchanges such as RightMedia, AdECN
and DoubleClick Ad Exchange. These exchanges allow real-time bidding, that is, each time the
publisher contacts the exchange, the exchange "calls out" to solicit bids from ad networks. This
solicitation of bids is a novel aspect, in contrast to the existing literature.

This suggests developing a joint optimization framework which optimizes over the allocation
as well as the solicitation. We model this selective call out as an online recurrent Bayesian
decision framework with bandwidth type constraints. We obtain natural algorithms with bounded
performance guarantees for several natural optimization criteria. We show that these results
hold under different call out constraint models and different arrival processes. Interestingly, the
paper shows that under MHR assumptions, the expected revenue of the generalized second price
auction with reserve is a constant factor of the expected welfare. The analysis herein also allows
us to prove adaptivity gap type results for the Adwords problem.


1 Introduction 

A dominant form of advertising on the Internet involves display ads; these are images, videos and 
other ad forms that are shown on a web page when viewers navigate to it. Each such showing is 
called an impression. Increasingly, display ads are being sold through exchanges such as Right- 
Media, AdECN and DoubleClick Ad Exchange. On the arrival of an impression, the exchange 
solicits bids and runs an auction on that particular impression. This allows real time bidding where 
ad networks can determine their bids for each impression individually in real time (for an example, 
see [24]), and more importantly where the creative (advertisement) can be potentially produced
on-the-fly to achieve better targeting [22].

This potential targeting comes hand in hand with several challenges. The exchange and the
networks face a mismatch in infrastructure, capacities and objectives. From an infrastructure



*Part of this work was done while visiting Google Research. Department of Computer and Information Science,
University of Pennsylvania, Philadelphia, PA. Email: tanmoy@cis.upenn.edu

†Google Research, 76 Ninth Ave, New York, NY. Email: evendar@google.com

‡Part of this work was done while visiting Google Research. Department of Computer and Information Science,
University of Pennsylvania, Philadelphia, PA. Email: sudipto@cis.upenn.edu

§Google Israel and The Blavatnik School of Computer Science, Tel-Aviv University, Tel-Aviv, Israel. Email:
mansour.yishay@gmail.com

¶Google Research, 76 Ninth Ave, New York, NY. Email: muthu@google.com






standpoint, the volume of impressions that come to the exchange is very large in comparison to the
capacity of a smaller ad network limited in servers, bandwidth and geographic location preferences.
This implies a bound on the number of auctions the network can participate in effectively. A network
would prefer to be solicited only on impressions which are of interest to it, and in practice networks
use a descriptive language to specify features of impressions (say, only impressions from NY).
However this is an offline feature and runs counter to the attractiveness of real time bidding.
Therefore the exchange has to "call out" to the networks selectively, simultaneously trying to balance
the objective of soliciting as many networks as possible and increasing total value against not
creating congestion or situations where solicitations are not answered.

This leads to a host of interesting questions in developing a joint optimization framework that 
optimizes over the allocation objective as well as the decisions to solicit the bids. Specifically, a 
participant would be solicited for only a predetermined fraction of impressions. Moreover, these 
solicitations, referred to as "call outs" henceforth, need to be performed in a smooth manner and 
avoid burstiness. The impressions need to be managed in an online manner, which suggests the use
of online algorithms. The burstiness properties suggest using a queueing model. And the overall
goal is to optimize objectives such as (expected) welfare and revenue. While each of these issues has
been considered in isolation, the overall challenge is to develop a joint framework, which in turn
raises interesting questions about the interactions between different parts of the framework.

The call out framework is formally modeled in Section 1.1. The online allocation aspect, with a view
that the call out constraints act as budgets, is reminiscent of the online ad allocation framework
for search ads, or the Adwords problem [19, 5, 17], and its stochastic variants HIT). However the
call out framework is significantly different, which we discuss below. 

The Adwords problem is posed in the deterministic setting where the expected revenue is treated 
as a known deterministic reward of allocating an impression j to an advertiser i. The call out 
framework has no deterministic analogue; the rationale of the exchange is that the bids are not 
known. If the bids were known (or internal) then we would only call out the winners (assuming 
multiple slots) of the auction, which is the path taken by the Adwords problem. The call-out 
framework is similar to Bayesian mechanism design [21]. This has some fairly broad conceptual
implications. 

First is the notion of "adaptivity gap", where a policy is allowed to react to the realization of the
random variables. The analysis of the adaptivity gap is the central question in the exchange setting.
This is also relevant in the context of search ads and the Adwords setting, where the revenue is
achieved on a click, which is a random event. The Adwords model uses the deterministic expectation
but it is reasonable to allow an algorithm to adapt to this event (consider low click through rates
and large bids, such that a payout affects the budget substantially). To the best of our knowledge,
no analysis of the adaptivity gap exists for the Adwords problem, but such a result will follow from
our analysis. In the call out setting, when optimizing for welfare or total value, the assignment
occurs after the bids are obtained, which leaves a considerable gap in comparison to an assignment
made before obtaining the realizations (which reduces to the expectations).

Second, many objective functions such as generalized second price with reserve (henceforth GSP- 
Reserve), for one or multiple slots, have a very different behavior in the Bayesian and deterministic 
settings. The gap between assignment after and before the realizations is more stark in this context - 
consider running Myerson's (or similar) mechanism on the expected bids instead of the distributions. 
Note that in GSP-Reserve we announce a uniform reserve price before the bids are solicited, as in
[21]. For known deterministic bids, reserve prices can be made equal to the bid, and are not useful.
Strong lower bounds hold for GSP without reserves pQ. 

Third, the notion of a comparison class in the call out optimization framework requires more
care. In the setting of these large exchanges, a comparison class with full foreknowledge of all
information (in particular, the realization of the bids) is unrealistic. Moreover, the realizations of
the bids depend on the networks which are called out, and two different strategies that call out to
two different subsets will have completely different information. Thus to compare two algorithms,
it appears that we should compare their expected outcome, but each algorithm is allowed to be
adaptive. Thus a combination of stochastic and online models is in order in this setting.

This combination of stochastic and online models relates the call out framework to the stochastic
variants of the Adwords problem [HJEZ]. But while the similarity implies that Lagrangian decoupling
techniques for separable convex optimization pioneered by Rockafellar [25] apply, the different
possible objectives of the call-out framework are not convex. In fact, in the case of optimizing
revenue in GSP (with reserve) or in posted price mechanisms, the objective is not submodular for
all prices (as in welfare maximization). Submodular maximization with linear constraints has been
studied, and while good approximation algorithms exist [SI HI], they are inherently offline, whereas
the key aspect of call out optimization is that the decision has to be made in an online fashion. The
same is true for the sequential posted price mechanisms analyzed in [U 0] (albeit in a more general
matroid setting): the posted prices in the call out setting need to be announced in parallel (and the
eventual allocation is sequential). Other than formulating the call out framework, a significant
contribution of this paper is to demonstrate that relaxations of natural objective functions can be
made separable (as in [25]), yet with bounded loss in performance ratio. Subsequent to the
formulation, standard techniques of online stochastic optimization can be applied.

1.1 Selective Call Out: The Model 

Let $n$ be the number of ad networks $1, 2, \ldots, n$. We assume that impressions arrive from a fixed
(unknown) distribution over a finite set $U_I$, and that there exists a finite set of bid values $U_b$,
where $L = \max\{u \mid u \in U_b\}$. In the following, ad networks will be indexed by $i \in \{1, 2, \ldots, n\}$,
impressions by $j \in U_I$ and bid values by $k \in U_b$. The problem setting involves several steps:

1. An impression (or keyword) $j$, assumed drawn from a distribution $\mathcal{D}$, comes to the exchange.
There may be multiple slots associated with a single impression, corresponding to text ads being
blocked together or to different locations in the page, which are often characterized by different
discount rates. Let there be $M$ slots, with discount rates $1 \ge \varrho_1 \ge \varrho_2 \ge \cdots \ge \varrho_M > 0$. If a
bidder bids $v$, then it is assumed that the bid for the $\ell$-th slot is $v \varrho_\ell$. The case of $M = 1$ is
common and corresponds to a basic pay-per-impression mechanism with discount rate 1. All
subsequent discussions apply to this common case as well.

2. Given an impression $j$, the bid of ad network $i$ for impression $j$ is drawn from a fixed
distribution $V_{ij}$ such that the bid is $v$ with probability $p_{ijv}$. Note that the bids of different networks
are likely to be correlated based on the perceived value of the impression; however, conditioned
on $j$, the specific dynamics of different bidders can be construed to be independent.
We assume that the exchange has learned or can predict these $p_{ijv}$ given the impression $j$.
This is an assumption similar to the estimation processes used by search engines to predict the
value of different advertisements (albeit under static standing bids).

3. The exchange decides on the subset $S_j$ of networks to call out, subject to the Call-Out
constraints, which roughly bound the rate at which the exchange can send impressions to
solicit bids from an ad network. This decision is executed before seeing the next impression.

To define a specific problem in the above framework, we need to specify (i) an objective function 
(ii) a model for call out constraints, and (iii) the comparison class. The goal is to design a call-out 
policy that satisfies (ii) and is near optimal in the objective function (i), when compared to other 
algorithms in class (iii). We discuss the instantiations of (i),(ii), and (iii) in the following. 

(i) Objective Functions. We consider three different objective functions: (a) Total Value, the
sum of the maximum bids in $S_j$ over the arriving impressions $j$; (b) GSP-Reserve (defined above);
and (c) Revenue under a posted price mechanism (take-it-or-leave-it prices). All the quantities are
in expectation. Total value corresponds to the welfare. The GSP with a uniform reserve price is
a common mechanism used in these settings. Posted prices (different networks may get different
prices) are also used in this context. Note that the mechanism is a parallel posted price mechanism
because the prices are posted before any bids are obtained, and not sequential as in [8j 0].
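
To make the Total Value objective concrete, the following is a minimal sketch that computes the expected discounted value of the top bids for one impression by brute-force enumeration. It assumes independent discrete bid distributions for the called-out networks; the function name `expected_total_value` and the dictionary representation are illustrative choices, not notation from the paper, and the enumeration is only practical for small examples.

```python
from itertools import product


def expected_total_value(bid_dists, discounts):
    """Expected discounted value of the top bids among the called-out networks.

    bid_dists: list of dicts {bid_value: probability}, one per called-out
               network (assumed independent given the impression).
    discounts: slot discount rates in non-increasing order (length M).
    """
    total = 0.0
    supports = [list(d.items()) for d in bid_dists]
    # Enumerate every joint realization of the bids.
    for outcome in product(*supports):
        prob = 1.0
        bids = []
        for value, p in outcome:
            prob *= p
            bids.append(value)
        bids.sort(reverse=True)
        # Highest bid takes slot 1, the next one slot 2, and so on.
        total += prob * sum(q * b for q, b in zip(discounts, bids))
    return total


# Two networks, each bidding 0 or 2 with equal probability.
dists = [{0: 0.5, 2: 0.5}, {0: 0.5, 2: 0.5}]
single_slot = expected_total_value(dists, [1.0])
two_slots = expected_total_value(dists, [1.0, 0.5])
```

With a single slot the value is the expected maximum bid; adding a second slot with discount 0.5 adds half of the expected second-highest bid.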

(ii) Call Out Constraint Models. The simplest model for the call out constraints is a model
where one impression arrives at the exchange at each time step and if the total number of arrivals
is $m$, ad network $i$ can be solicited at most $\rho_i m$ times for some known $\rho_i \le 1$. We will refer to this as
the time average model under uniform arrival, which describes the constraints at the outgoing and
incoming sides of the exchange respectively. Most of the paper will focus on this model, primarily
because we can show that other common models reduce to this variant. The non-initiated reader
can skip the description of these models and proceed to (iii).

On the incoming side of the exchange, standard practice is to assume bursty (Poisson) arrivals.
We consider this generalization. On the outgoing side, the simple model allows the possibility that
the call-outs to a network are made on a contiguous subset of impressions. This misses the original
goal that the ad network would receive the impressions at a "smooth" rate. A common model used
for this behavior is the token bucket model [26]. A token bucket has two parameters, bucket size $\sigma$ and
token generation rate $\rho$. The tokens represent sending rights, and the bucket size is the maximum
number of tokens we can store. The tokens are generated at a rate of $\rho$ per unit time, but the
number of tokens never exceeds $\sigma$. In order to send, one needs to use a token, and if there are no
tokens, one cannot send. The output stream of a $(\sigma, \rho)$ token bucket can be handled by a buffer of
size $\sigma$ and a time average rate of $\rho$; the buffer is initially full. Unlimited buffer size corresponds
to the time average model.
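
The token bucket described above can be sketched in a few lines. This is an illustrative simulation only: the class name `TokenBucket`, the discrete time step, and the convention that one call out costs exactly one token are assumptions made for the example.

```python
class TokenBucket:
    """Token bucket with a maximum size and a per-step token generation rate."""

    def __init__(self, size, rate):
        self.size = size            # bucket size: maximum number of stored tokens
        self.rate = rate            # tokens generated per unit time
        self.tokens = float(size)   # the bucket starts full

    def tick(self):
        """Advance one time step: add tokens, never exceeding the bucket size."""
        self.tokens = min(self.size, self.tokens + self.rate)

    def try_send(self):
        """Spend one token if available; return whether a call out is allowed."""
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


# Try to call out on every impression for 10 steps with size 2 and rate 0.5.
bucket = TokenBucket(size=2, rate=0.5)
sent = []
for _ in range(10):
    sent.append(bucket.try_send())
    bucket.tick()
```

The initial burst uses up the stored tokens, after which call outs settle at the generation rate of one every two steps.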

(iii) Comparison Class. Given the call out constraints, we define the class of admissible policies. 

Definition 1 An admissible call out policy specifies (possibly with randomization), for each
arriving impression $j$, the subset $S_j$ of ad networks to call out, while satisfying all call out constraints
over the entire sequence of impressions. The policy bases its decision on the prior information about
the bid distributions, and has no knowledge of the actual bid values. In the case of the GSP-Reserve
mechanism, the call out policy also decides the reserve price for each impression. In the case of a
posted price mechanism, the call out policy also decides the posted/take-it-or-leave-it prices.

Our comparison class is the set of all admissible call out policies which know the bid distributions 
for every impression, but do not know the actual realization of the bids. The performance of a 
policy is measured as the expected (over the bid distributions and the impression arrivals) objective 
value obtained per arriving impression, when impressions are drawn from $\mathcal{D}$.

1.2 Our Results, Roadmap and Other Related Work

We provide three algorithms LP-Val, LP-GSP, and LP-Post for the three objectives discussed,
for the time average uniform arrival model. We then prove that any approximation for any additive
objective function translates naturally to an approximation for the other constraint models. Recall
that $L$ is the largest possible bid. A policy is $\alpha$-approximate if it achieves at least $\alpha$ times the
performance of the optimal policy for the corresponding objective. The algorithms follow a natural
two-phase approach: we use the first $t$ impressions as a sample for exploration and subsequently
exploit what was learned (see Section 2 for more discussion). We show that:

Theorem 1 Suppose the optimal policy has expected total value at least $\delta > 0$. For any $\epsilon > 0$,
LP-Val with a sample of t = 0(JK^-) impressions gives a $(1 - \frac{1}{e} - \epsilon)$-approximate policy.

Theorem 2 Suppose the optimal GSP-Reserve policy has expected revenue at least $\delta > 0$. For any
$\epsilon > 0$, LP-GSP with a sample of t = 0{%^-) impressions gives an $O(1)$-approximate policy, if all
bid distributions satisfy the monotone hazard rate (MHR) property. Moreover, the call outs of the
policy derived from LP-GSP are identical to those of the policy derived from LP-Val.

In particular, we show that when every bidder is solicited, GSP-Reserve achieves a revenue that
is an $O(1)$ factor of the optimal welfare, when all bid distributions satisfy the MHR property. This is a
common distributional assumption in economic theory, and is satisfied by many distributions [3].
This result is in the same spirit as (but immediately incomparable to) the result in [5], which relates
the optimum sequential posted price revenue to the optimal welfare under the same assumptions.
We are unaware of such results about GSP-Reserve.

Theorem 3 Suppose the optimal posted price policy has expected revenue at least $\delta > 0$. For any
$\epsilon > 0$, LP-Post with a sample of t = 0{%^) impressions gives a (1 - * - $\epsilon$)-approximate policy.

We do not need the MHR assumption, unlike the result in [4] for sequential posted prices, since the
comparison classes are different (and the prices are posted in parallel). We next show that:

Theorem 4 For a given distribution on impressions $\mathcal{D}$, suppose we have an $\alpha$-approximate policy
for an objective which is additive given the allocations and the realizations (and is at least $\delta > 0$) in
the time average uniform arrival call out model. Let o~{ > a Vi. Then we can convert the policy to 
a (a — r^j) -approximate policy in the token bucket model. The result extends to Poisson arrivals. 

Roadmap: We summarize the results on online stochastic convex optimization in Section 2. We
subsequently discuss the total value problem in Section 3. We discuss the GSP-Reserve problem
in Section 4. The posted price problem is discussed in Section 5. The token bucket model and
other arrival assumptions are discussed in Section 6. To draw a contrast to the stochastic results,
the adversarial order setting is discussed in Appendix A.

Other Related Work: A combination of stochastic and online components appears in many
different settings [14, 15, 2, 13] which are not immediately relevant to the call-out problem. We
note that bandwidth-like constraints (where the constraint is on a parameter different from
the obtained value, as is the case for call-outs) have not been studied in the bandit setting (see [TBI [23] )
because the horizon is constrained. Finally, bidding and inventory optimization problems studied
in the context of ad exchanges |1 1 | HQj . [20], are not related to the call out optimization problems.

2 Preliminaries 

Consider maximizing a "separable" linear program (LP) $\mathcal{L}$ defined on $Q$ global constraints with
right hand side $b_i$, such that the Lagrangian relaxation produced by transferring these
constraints to the objective function decouples into a collection of independent non-negative smaller
LPs $\mathcal{L}_j$ over $n'$ variables and local constraints. This implies that the objective function of $\mathcal{L}$ is a
weighted linear combination of the $\mathcal{L}_j$. The uniqueness of the optimum solutions for $\mathcal{L}_j$ implies that
$\mathcal{L}$ reduces to finding the Lagrangian multipliers. The unique solution is achieved by adding "small
perturbations", see Rockafellar [53]. However, this approach only provides a certificate of
optimality and a solution, once we are given the Lagrangians. The approach does not give us an algorithm
to find the Lagrangian multipliers themselves.

Devanur and Hayes [S] showed that if the smaller LPs could be sampled with the same probability
as their contribution to the objective of $\mathcal{L}$, and the derivatives of the Lagrangian can be bounded,
then the Lagrangians derived from a small number of samples (suitably scaled) can be used to
solve the overall LP. The weighted sampling reduces to the prefix of the input if the $\mathcal{L}_j$ arrive in
random order (see [12]). This was extended to convex programs in [27]. The bound on the number
of samples requires several (easy) Lipschitz type properties:

1. The optimum value of $\mathcal{L}$ is at least $\delta > 0$ and the optimum solution of $\mathcal{L}_j$ is at most $R$.

2. For each setting of the Lagrangians, every $\mathcal{L}_j$ has a unique optimum solution.

3. Reducing $b_i$ by a factor of $1 - \epsilon$ does not reduce $\mathcal{L}$ by more than a factor of $(1 - \epsilon)$.

4. $\mathcal{L}$ does not change by more than a constant times $1 + \epsilon$ if we alter the value of the optimum
Lagrangian multipliers by a factor of $1 + \epsilon$.

Theorem 5 Sample t = 0( n 9 R ) of the smaller linear programs and consider the linear program
which corresponds to the union of these smaller linear programs and suitably scaled global
constraints. If we use the optimum Lagrangian multipliers corresponding to the global constraints of
this sampled program to solve the decoupled instances of $\mathcal{L}_j$ as they become available (in an online
fashion) then we produce a $1 + \epsilon$ approximation to the optimum solution of $\mathcal{L}$.

In the setting of Adwords, the smaller LPs correspond to the arrival of an impression, and the
associated assignment. Thus the stochastic framework of the Adwords problem is obviously of
relevance to the call out optimization framework. However the focus shifts to solving the smaller
LPs which encode the call out decision. A nice outcome of the approach is a simple two phase
algorithm: an Exploration phase where the samples are drawn, and an Exploitation phase where
the Lagrangian multipliers are used. If $Q$, $n'$, $R$ are small, then the exploration phase can be
(relatively) short and this yields a natural algorithm. Thus the goal of the rest of the paper would
be to formulate separable convex relaxations and achieve the mentioned properties.
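
The exploration/exploitation idea can be illustrated on a deliberately tiny special case, which is not the algorithm of the paper. The sketch assumes a single ad network with one call out rate constraint and a known scalar gain per impression, so that the single Lagrangian multiplier reduces to a quantile of the gains. The names `learn_multiplier` and `two_phase` and the uniform random gains are hypothetical.

```python
import random


def learn_multiplier(sample_gains, rate):
    """Exploration: choose lambda so that roughly a `rate` fraction of the
    sampled gains exceed it (the price of one call out in the sampled LP)."""
    ordered = sorted(sample_gains, reverse=True)
    k = int(rate * len(ordered))     # number of sampled call outs we can afford
    if k >= len(ordered):
        return 0.0                   # constraint is slack: every call out is free
    return ordered[k]


def two_phase(gains, rate, t):
    """Learn lambda from the first t impressions, then call out on a later
    impression exactly when its gain exceeds lambda."""
    lam = learn_multiplier(gains[:t], rate)
    decisions = [g > lam for g in gains[t:]]
    return lam, decisions


# Small deterministic example: rate 0.5, first four impressions form the sample.
lam_small, dec_small = two_phase([4, 1, 3, 2, 5, 0, 3, 1], rate=0.5, t=4)

# Larger random example: the call out fraction should track the rate.
random.seed(0)
stream = [random.random() for _ in range(20000)]
lam_big, dec_big = two_phase(stream, rate=0.3, t=2000)
frac_called = sum(dec_big) / len(dec_big)
```

In the general framework the multipliers come from solving the sampled LP rather than from a quantile, but the online step has the same shape: each arriving impression is handled by a small decoupled problem priced by the learned multipliers.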



3 The Total Value Problem 

In this section, we prove Theorem 1 and describe LP-Val. Let $q_j$ denote the probability that
impression $j$ arrives. We shall add infinitesimal random perturbations to $p_{ijv}$ which shall not affect
the performance of any policy but ensure that the $p_{ijv}$ are in general position, that is, any combination
of them will almost surely create a non-singular matrix.

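The LP Relaxation:

1. Let $x_{ij}$ be the (conditional) probability that advertiser $i$ was called out on impression $j$.

2. Let $y_{ijv\ell}$ be the probability that advertiser $i$ bid the value $v$ and was assigned the slot $\ell$ (also
conditioned on $j$). The constraints are named $A(x,y)$ and $B(x,y)$ as shown.

\[
\mathrm{LP1} = \max \sum_j q_j \sum_i \sum_v \sum_\ell v \varrho_\ell \, y_{ijv\ell}
\quad \text{s.t.} \quad
\begin{array}{l}
\sum_j q_j x_{ij} \le \rho_i \\[4pt]
\left.
\begin{array}{l}
\sum_i \sum_v y_{ijv\ell} \le 1 \\
\left.
\begin{array}{l}
x_{ij} \le 1 \\
\sum_\ell y_{ijv\ell} \le p_{ijv} x_{ij} \\
x_{ij},\, y_{ijv\ell} \ge 0
\end{array}
\right\} B(x,y)
\end{array}
\right\} A(x,y)
\end{array}
\]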



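Decoupling: Let $\lambda^*_i$ be the optimum Lagrangian variable for the constraint $\sum_j q_j x_{ij} \le \rho_i$. LP1
then decouples into smaller LPs, $\mathrm{LP2}(j, \lambda^*)$, subject to the constraints $A(x,y)$:

\[
\mathrm{LP1} = \mathrm{LP1}(\lambda^*) = \sum_i \lambda^*_i \rho_i + \sum_j q_j \, \mathrm{LP2}(j, \lambda^*)
\quad \text{where} \quad
\mathrm{LP2}(j, \lambda^*) = \max \sum_i \sum_v \sum_\ell v \varrho_\ell \, y_{ijv\ell} - \sum_i \lambda^*_i x_{ij}
\]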



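Solving $\mathrm{LP2}(j, \lambda^*)$. We begin by considering the dual. Let $\tau_{j\ell}$ be the dual of the constraint
$\sum_i \sum_v y_{ijv\ell} \le 1$. Let $\xi_{ijv}$ correspond to the dual of the constraint $\sum_\ell y_{ijv\ell} \le p_{ijv} x_{ij}$. Let $\zeta_{ij}$
correspond to the dual of $x_{ij} \le 1$.

\[
\mathrm{DualLP2}(j, \lambda^*) = \min \sum_i \zeta_{ij} + \sum_\ell \tau_{j\ell}
\quad \text{s.t.} \quad
\begin{array}{l}
\tau_{j\ell} + \xi_{ijv} \ge v \varrho_\ell \\
\zeta_{ij} - \sum_v \xi_{ijv} \, p_{ijv} \ge -\lambda^*_i \\
\tau_{j\ell},\, \xi_{ijv},\, \zeta_{ij} \ge 0
\end{array}
\]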

Lemma 3.1 Let $\tau^*_{j\ell}$ be the optimum dual variables for LP2. Then (i) For all $\ell$ there exists $i$ and
$v \ge \tau^*_{j\ell}/\varrho_\ell$ such that $\xi^*_{ijv} = v \varrho_\ell - \tau^*_{j\ell}$. (ii) $\tau^*_{j\ell}/\varrho_\ell$ is non-increasing in $\ell$.

Proof: For every $\ell$ there must be some $i, v$ such that we have $\tau^*_{j\ell} + \xi^*_{ijv} = v \varrho_\ell$. Otherwise we can
keep decreasing $\tau^*_{j\ell}$ keeping all other variables the same and contradict the optimality of the dual
solution. Now $\xi^*_{ijv} \ge 0$ and the condition on $v$ follows. The condition corresponds to the set of
points $(v, \xi^*_{ijv})$ in the two dimensional $(x,y)$ plane being above the lines $\{y = \varrho_\ell x - \tau^*_{j\ell}\}$.

For the second part, consider $\tau^*_{j\ell}$ and the $i, v$ such that we have $\tau^*_{j\ell} + \xi^*_{ijv} = v \varrho_\ell$. Define $v$ to be
the support of $\ell$. Let $v \ge \tau^*_{j\ell}/\varrho_\ell$ be the largest such support of $\ell$. Consider $\tau^*_{j(\ell-1)}$. We have
$(\tau^*_{j(\ell-1)} + \xi_{ijv})/\varrho_{\ell-1} \ge v = (\tau^*_{j\ell} + \xi_{ijv})/\varrho_\ell$. But $\varrho_{\ell-1} \ge \varrho_\ell$ and thus $\tau^*_{j\ell}/\varrho_\ell$ are non-increasing in $\ell$.
Moreover, if $\varrho_\ell = \varrho_{\ell-1}$ then $\tau^*_{j\ell} = \tau^*_{j(\ell-1)}$ (starting from the support of $\ell - 1$). □

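Decoupling $\mathrm{LP2}(j, \lambda^*)$ itself. Consider $\mathrm{LP2}(j, \lambda^*)$ with the Lagrangians $\tau^*_{j\ell}$. The problem
decomposes, under the constraints $B(x,y)$, to

\[
\mathrm{LP2}(j, \lambda^*) = \sum_\ell \tau^*_{j\ell} + \sum_i \mathrm{LP3}(j, \lambda^*_i, \tau^*_{j\ell}, i)
\quad \text{where} \quad
\mathrm{LP3}(j, \lambda^*_i, \tau^*_{j\ell}, i) = \max \sum_\ell \sum_v \left[ v \varrho_\ell - \tau^*_{j\ell} \right] y_{ijv\ell} - \lambda^*_i x_{ij}
\]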



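Lemma 3.2 Define $\ell(v) = \arg\max_{\ell'} \{ v \varrho_{\ell'} - \tau^*_{j\ell'} \mid \varrho_{\ell'} v > \tau^*_{j\ell'} \}$ and $\ell(v) = M + 1$ if the set is empty.
Set $y^*_{ijv\ell} = p_{ijv}$ if $\ell = \ell(v)$ and $0$ otherwise. If $\sum_v \sum_\ell v \varrho_\ell \, y^*_{ijv\ell} \ge \lambda^*_i$ we set $x_{ij} = 1$ and $y_{ijv\ell} = y^*_{ijv\ell}$.
Otherwise we set $x_{ij} = y_{ijv\ell} = 0$.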

Proof: $\mathrm{LP3}(j, \lambda^*_i, \tau^*_{j\ell}, i)$ is optimized at $x_{ij} = 1$ or $x_{ij} = 0$. This is because if $0 < x_{ij} < 1$ and
$\sum_\ell \sum_v (v \varrho_\ell - \tau^*_{j\ell}) y_{ijv\ell} - \lambda^*_i x_{ij} > 0$ then we can multiply all the variables by $1/x_{ij}$ and have a
better solution. If the latter condition is not true then $x_{ij} = 0$ is an equivalent solution. If $x_{ij} = 1$
the optimal setting for $y_{ijv\ell}$ is $y^*_{ijv\ell}$. (Note that $y^*_{ijv\ell}$ is uniquely determined for a fixed $v$.) Thus
the overall optimization follows from comparing the $x_{ij} = 1$ and $x_{ij} = 0$ cases. □

Interpretation and the Call Out Algorithm: Given $\{\tau^*_{j\ell}\}_{\ell=1}^{M}$, the distribution $\{p_{ijv}\}$ for $i$ is
divided into at most $M + 1$ pieces (some of the pieces can be a single point) given by the upper
envelope (the constraint max) of the lines $\{\varrho_\ell x - \tau^*_{j\ell}\}_{\ell=1}^{M}$ and the line $y = 0$, in the $x$-$y$ coordinate
plane. Intuitively, seeing the value $x = v$, if the upper envelope corresponds to the equation $\varrho_\ell x - \tau^*_{j\ell}$
then we are "interested" in the slot $\ell$. If the weighted (by $\varrho_\ell$) sum of interests, given by $\sum_v \sum_\ell v \varrho_\ell \, y^*_{ijv\ell}$,
exceeds $\lambda^*_i$, then it is beneficial to call out $i$. We call out based on this condition and allocate the
slots in decreasing order of bids.
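
The per-network subproblem LP3 can be sketched as follows, comparing the $x_{ij} = 0$ and $x_{ij} = 1$ cases as in the proof of Lemma 3.2. This is a hedged illustration: it evaluates the LP3 objective as written (bid value times discount, net of the slot multiplier, minus the call out multiplier), ties with zero gain are left unassigned since they do not change the objective, and the names `best_slot` and `solve_lp3` as well as all numeric inputs are hypothetical.

```python
def best_slot(v, discounts, tau):
    """Slot on the upper envelope for bid v: the slot maximizing
    v * discount - tau, or None if no slot has strictly positive gain."""
    best, best_gain = None, 0.0
    for slot, (q, price) in enumerate(zip(discounts, tau)):
        gain = v * q - price
        if gain > best_gain:
            best, best_gain = slot, gain
    return best, best_gain


def solve_lp3(bid_dist, discounts, tau, lam):
    """Return (x, y, objective) for one network on one impression.

    bid_dist: dict {bid_value: probability} for this network.
    tau:      per-slot multipliers (assumed already learned).
    lam:      this network's call out multiplier (assumed already learned).
    """
    y = {}
    gain_if_called = 0.0
    for v, p in bid_dist.items():
        slot, gain = best_slot(v, discounts, tau)
        if slot is not None:
            y[(v, slot)] = p          # all mass of bid v goes to its best slot
            gain_if_called += p * gain
    if gain_if_called >= lam:         # calling out beats staying silent
        return 1, y, gain_if_called - lam
    return 0, {}, 0.0


bids = {1: 0.5, 4: 0.5}
x_hi, y_hi, obj_hi = solve_lp3(bids, [1.0, 0.5], [2.0, 0.5], lam=0.8)
x_lo, y_lo, obj_lo = solve_lp3(bids, [1.0, 0.5], [2.0, 0.5], lam=1.2)
```

In this example only the bid of 4 is interested in a slot (the first one), giving an expected net gain of 1.0, so the network is called out when its multiplier is 0.8 and skipped when it is 1.2.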

Analysis: The LP2 solution satisfies: $\mathrm{LP1} = \sum_i \sum_v v \, p_{ijv} \, \varrho_{\ell(v)}$ and $\sum_i \sum_{v : \ell(v) = \ell} p_{ijv} = 1$. For each
slot $\ell$, let $w_i(\ell) = \sum_{v : \ell(v) = \ell} v \, p_{ijv}$ and $u_i(\ell) = \sum_{v : \ell(v) = \ell} p_{ijv}$. Order the $i$ in non-increasing order of
$w_i(\ell)/u_i(\ell)$ inside the slot. If we call out to $i$ and get $v$, then for the sake of analysis we will consider
its contribution to slot $\ell(v)$ only. Moreover, we stop the contribution to a slot $\ell$ if any of the $i$
return a value $v$ with $\ell(v) = \ell$. The best $M$ ordered bids outperform the analyzed contribution in
every scenario. Therefore it suffices to bound the contribution of the analysis.

Lemma 3.3 Suppose we are given a set of independent variables $Y_i$ such that $\Pr[Y_i \neq 0] = u_i$ and
$E[Y_i] = w_i$. Consider the random variable $Y$ corresponding to the process which orders the variables
$\{Y_i\}$ in non-increasing order of $w_i/u_i$, and stops as soon as the first non-zero value is seen. Then

\[ E[Y] = \sum_i \prod_{i' < i} (1 - u_{i'}) \, w_i \ge \left( 1 - e^{-1} \right) \sum_i w_i . \]

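Proof: Let $F(\{(w_i, u_i)\}) = \sum_i \prod_{i' < i} (1 - u_{i'}) \, w_i$. Let $\lambda = \sum_i w_i / \sum_i u_i$. Given the sequence
$\{(w_i, u_i)\}$ where $w_i/u_i$ are non-increasing, if there exists an $i$ such that $w_i/u_i \neq w_{i+1}/u_{i+1}$, then
define a new sequence $\{(w'_{i'}, u_{i'})\}$ as follows, for some $\Delta > 0$:

\[
w'_{i'} = \begin{cases}
w_{i'} & \text{if } i' \neq i, i+1 \\
w_i - \Delta & \text{if } i' = i \\
w_{i+1} + \Delta & \text{if } i' = i + 1
\end{cases}
\]

Note that $\sum_i w'_i = \sum_i w_i$ and $w'_i/u_i$ remains non-increasing. Now

\[
F(\{(w_i, u_i)\}) - F(\{(w'_i, u_i)\}) = \prod_{i' < i} (1 - u_{i'}) \Delta - \prod_{i' < i+1} (1 - u_{i'}) \Delta = \prod_{i' < i} (1 - u_{i'}) \, u_i \Delta \ge 0 .
\]

Thus, we can repeatedly perform the above steps till we get a sequence such that $w'_i/u_i$ remains the same
for all $i$ and $\sum_i w'_i = \sum_i w_i$. Clearly $w'_i = \lambda u_i$ in this case. The function $F$ will continue to decrease, and

\[
F(\{(w_i, u_i)\}) \ge F(\{(\lambda u_i, u_i)\}) = \sum_i \prod_{i' < i} (1 - u_{i'}) \lambda u_i = \lambda \left( 1 - \prod_i (1 - u_i) \right) .
\]

Writing $x = \sum_i u_i$, we have $1 - \prod_i (1 - u_i) \ge 1 - e^{-x}$ and $\lambda = \sum_i w_i / x$.
But $\frac{1}{x}(1 - e^{-x})$ is decreasing over $[0, 1]$ and the worst case is $x = 1$. □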
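
As a numerical sanity check of the bound in Lemma 3.3 (not a proof), the sketch below evaluates the expected reward of the stop-at-first-nonzero process and compares it with $(1 - e^{-1}) \sum_i w_i$ on random instances. It assumes, as in the application to a single slot, that the probabilities $u_i$ sum to at most 1; the function name `stopped_reward` and the random instance generator are illustrative.

```python
import math
import random


def stopped_reward(pairs):
    """E[Y] for the process that scans variables in non-increasing order of
    w/u and stops at the first non-zero value.

    pairs: list of (w_i, u_i) with u_i = Pr[Y_i != 0] > 0 and w_i = E[Y_i].
    """
    ordered = sorted(pairs, key=lambda wu: wu[0] / wu[1], reverse=True)
    expected, none_before = 0.0, 1.0
    for w, u in ordered:
        expected += none_before * w   # Y_i counts only if all earlier ones were zero
        none_before *= 1.0 - u
    return expected


random.seed(1)
worst_ratio = float("inf")
for _ in range(200):
    n = random.randint(1, 6)
    raw = [0.01 + random.random() for _ in range(n)]
    scale = random.uniform(0.1, 1.0) / sum(raw)      # makes sum of u_i at most 1
    us = [r * scale for r in raw]
    pairs = [(u * random.uniform(0.1, 10.0), u) for u in us]
    total_w = sum(w for w, _ in pairs)
    worst_ratio = min(worst_ratio, stopped_reward(pairs) / total_w)

bound = 1.0 - math.exp(-1.0)
```

On these instances the worst observed ratio stays above the bound, consistent with the lemma.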

In slot $\ell$ (renumbering the advertisers in the order of $w_i(\ell)/u_i(\ell)$) we get an expected reward of
$w_i(\ell)$ if we reach $i$. But the events are independent in a particular slot. Thus the expected reward
in a slot is bounded below (using independence and Lemma 3.3) by $(1 - e^{-1})$ times $\sum_i w_i(\ell)$. We
now apply linearity of expectation across the slots; observe that the events across the slots are
quite correlated. The expected reward is at least $(1 - e^{-1})$ times $\sum_\ell \varrho_\ell \sum_i w_i(\ell) = \mathrm{LP1}$. Theorem 1
follows from Lemmas 3.2 and 3.3 and the application of Theorem 5.



4 Generalized Second Price with reserve (GSP-Reserve) 

The call outs for this problem would be exactly the same as in the algorithm of Section 3. We will
however adjust the reserve prices. The reserve price will be the same for all the advertisers being
called out on that impression. In fact either we will run a single slot auction with a reserve price,
or simply GSP for the $M$ slots. The decision will depend on the LP solution found for this specific
impression (and the contributions of different parts of the LP). Recall that the bid distribution $V_{ij}$
of advertiser $i$ on impression $j$ is assumed to satisfy the MHR property. We use the following:

Lemma 4.1 (Lemma 3.3 in \N) For any random variable $V$ following an MHR distribution, let
$v^* = \arg\min \{ v \mid v \Pr[V \ge v] \ge \frac{1}{2} \sum_{v' \ge v} v' \Pr[V = v'] \}$. Then $\Pr[V \ge v^*] \ge e^{-2}$.



The next lemma is a restatement of Lemma 3.2 and the subsequent analysis. 

Lemma 4.2 Given an impression $j(t)$, define $v_1(t) = \max_{\ell : \ell \neq 1} (\tau^*_{j1} - \tau^*_{j\ell})/(\varrho_1 - \varrho_\ell)$. The call
out to a set $S(t) \neq \emptyset$ ensures that $\sum_{i \in S(t)} \sum_{v \ge v_1(t)} p_{ijv} \le 1$.

Definition 2 Given an impression $j(t)$, and the call out decision to a set $S(t) \neq \emptyset$ at time $t$, let
$v_\psi(i,t) = \min \{ v \mid 2 v \Pr[V_{ij} \ge v] \ge \sum_{v' \ge v} v' p_{ijv'} \}$ and $\Psi(t) = \{ i \mid i \in S(t) \text{ and } v_\psi(i,t) \ge v_1(t) \}$.

Note that using Lemmas 4.1 and 4.2, we have $|\Psi(t)| \le \lfloor e^2 \rfloor = 7$ since each $i \in \Psi(t)$ contributes a probability
mass of at least $e^{-2}$.

Lemma 4.3 Given an impression $j(t)$, and the call out decision to a set $S(t) \neq \emptyset$ at time $t$, we can
set a single threshold $v^*(t) \ge v_1(t)$ such that if we set a reserve price $v^*(t)$ for a single slot then the
revenue (ignoring the multiplicative discount factor $\varrho_1$) is at least $\frac{1}{4(7e^2 + 1)} \sum_{i \in S(t)} \sum_{v \ge v_1(t)} v \, p_{ijv}$.

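Proof: Let $\sum_{i \in S(t)} \sum_{v \ge v_1(t)} v \, p_{ijv} = Z$. We have two cases: (i) $\sum_{i \in \Psi(t)} \sum_{v \ge v_1(t)} v \, p_{ijv} \ge 7e^2 Z/(7e^2 + 1)$
or (ii) otherwise. In case (i), pick the $i \in \Psi(t)$ such that $\sum_{v \ge v_1(t)} v \, p_{ijv}$ is maximized, which is
at least $e^2 Z/(7e^2 + 1)$ since $|\Psi(t)| \le 7$. Let $V_{ij}$ be the random variable that corresponds to the bid
of advertiser $i$ on impression $j$. Now since $v_\psi(i,t) \ge v_1(t)$ we have that

\[
\sum_{v \ge v_\psi(i,t)} v \, p_{ijv}
\ge \Pr[V_{ij} \ge v_\psi(i,t) \mid V_{ij} \ge v_1(t)] \sum_{v \ge v_1(t)} v \, p_{ijv}
\ge \Pr[V_{ij} \ge v_\psi(i,t)] \sum_{v \ge v_1(t)} v \, p_{ijv}
\ge \frac{1}{e^2} \sum_{v \ge v_1(t)} v \, p_{ijv}
\]

which is at least $Z/(7e^2 + 1)$. Now, if we set $v^*(t) = v_\psi(i,t)$ then just from $i$ we have
$v^*(t) \sum_{v \ge v^*(t)} p_{ijv} \ge \frac{1}{2} \sum_{v \ge v^*(t)} v \, p_{ijv}$ and therefore in this case the lemma is true.

In case (ii), we have $\sum_{i \in S(t) \setminus \Psi(t)} \sum_{v \ge v_1(t)} v \, p_{ijv} \ge Z/(7e^2 + 1)$. But for each $i \in S(t) \setminus \Psi(t)$ we
have $v_1(t) \sum_{v \ge v_1(t)} p_{ijv} \ge \frac{1}{2} \sum_{v \ge v_1(t)} v \, p_{ijv}$ and as a consequence, $v_1(t) \sum_{i \in S(t) \setminus \Psi(t)} \sum_{v \ge v_1(t)} p_{ijv}$ is
at least $Z/(2(7e^2 + 1))$. Consider setting $v^*(t) = v_1(t)$. Let $p = \sum_{i \in S(t) \setminus \Psi(t)} \sum_{v \ge v_1(t)} p_{ijv}$. Since
$p \le 1$ (from the definition of $v_1(t)$, see Lemma 4.2) the probability of sale is at least $(1 - \frac{1}{e}) p$ which is
bounded below by $p/2$. The Lemma follows in this case as well. □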

Lemma 4.4 Given an impression j(t), and the call out decision to a set S(t) 7^ at time t, 
consider (i) If ' Y.ieS(t)T. v VQi(v)y* ijvi{v) > ^J2ieS(t)J2 v >v 1 (t) v ^y*jvi then call-out to S(t) and run 
regular GSP. (ii) Otherwise call-out to S(t) and run a single slot auction with the threshold v*(t) 



given by Lemma 4.3. This algorithm gives a revenue which is an $\Omega(1)$ factor of the LP bound on
efficiency, which is given by $\sum_t \sum_{i \in S(t)} \sum_v v \varrho_{\ell(v)} \, y^*_{ijv\ell(v)}$. Note that the call out decisions are based on
optimizing the total value/efficiency of the slots, and thus are feasible.

Proof: Let the non-increasing ordered list of values that are returned for a time step be $a_1(t) \ge a_2(t) \ge \cdots$.
Suppose we are in case (i). Then the revenue of GSP is at least $\sum_{r=1}^{M} \varrho_r a_{r+1}(t)$. Now since the $\varrho_r$ are
decreasing, and the $a_r(t)$ are non-increasing,

\[
\sum_{r=1}^{M} \varrho_r a_{r+1}(t) \ge \sum_{r=1}^{M} \varrho_r a_r(t) - \varrho_1 a_1(t)
\;\Longrightarrow\;
E\left[ \sum_{r=1}^{M} \varrho_r a_{r+1}(t) \right] \ge E\left[ \sum_{r=1}^{M} \varrho_r a_r(t) \right] - \varrho_1 E[a_1(t)] \qquad (1)
\]
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We know from the rounding in Section 3 that E 



E P =1 QrOr(t)\ > (l - \) EieS(t) T, v v eey* ijveiv y 
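The inequality behind Equation (1) can be checked numerically. The sketch below is only an illustration: the function name `gsp_revenue_lower_bound` and the sample discounts and bids are assumptions made for this example, not values from the paper.

```python
def gsp_revenue_lower_bound(discounts, bids):
    """Return (sum_r rho_r * a_{r+1}, sum_r rho_r * a_r - rho_1 * a_1).

    discounts: slot discount factors rho_1 >= rho_2 >= ... >= rho_M.
    bids: returned bids in any order; missing bids are treated as 0.
    """
    a = sorted(bids, reverse=True)
    # Pad with zeros so that a_{M+1} exists even when few bids are returned.
    a += [0.0] * (len(discounts) + 1 - len(a))
    gsp = sum(rho * a[r + 1] for r, rho in enumerate(discounts))
    efficiency = sum(rho * a[r] for r, rho in enumerate(discounts))
    return gsp, efficiency - discounts[0] * a[0]


gsp, shifted = gsp_revenue_lower_bound([1.0, 0.6, 0.3], [4.0, 9.0, 2.0, 7.0])
print(gsp >= shifted)
```

The first returned number is the GSP lower bound on revenue and the second is the right-hand side of the per-instance inequality; taking expectations over bids gives Equation (1).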

We observe that $E[a_1(t)] \le \sum_{i\in S(t)} \sum_{v\ge v_1(t)} v\,y^*_{ijv1}$. This is easily seen if we write an LP for the maximum value seen (this LP is for analysis only). Let $x_{iv}$ be the probability that $i \in S(t)$ is the maximum with value $v$. Then (we drop the index $j$ for convenience):

$$E[a_1(t)] \le LP_{MAX} = \max \sum_i \sum_v v\,x_{iv} \quad \text{s.t.} \quad \sum_i \sum_v x_{iv} \le 1, \quad x_{iv} \le p_{iv}, \quad x_{iv} \ge 0$$

The optimum solution of $LP_{MAX}$ is $x^*_{iv} = p_{iv}$ for $v > \tau$ and $x^*_{iv} \le p_{iv}$ for one $i$ and $v = \tau$. Here $\tau$ is the optimum dual variable for the constraint $\sum_{i\in S(t)} \sum_v x_{iv} \le 1$. Note that $\sum_{i\in S(t)} \sum_v x^*_{iv} = 1$. For $v \ge v_1(t)$ we have $y^*_{ijv1} = p_{ijv}$ and for $v < v_1(t)$ we have $y^*_{ijv1} = 0$. Moreover $\sum_{i\in S(t)} \sum_{v\ge v_1(t)} y^*_{ijv1} = 1$. Likewise for $v \ge \tau$ we have $x^*_{iv} = p_{iv}$ and for $v < \tau$ we have $x^*_{iv} = 0$ and $\sum_{i\in S(t)} \sum_{v\ge\tau} x^*_{iv} = 1$.
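The structure of the optimum of the LP for the maximum (all probability mass above a threshold, a fractional amount at the threshold) is what a greedy fill produces. The following is a minimal sketch under that reading; the name `lp_max`, the input format, and the sample numbers are assumptions for illustration only.

```python
def lp_max(prob_by_value):
    """Greedy solution of: max sum v*x  s.t.  sum x <= 1, 0 <= x <= p.

    prob_by_value: list of (v, p) pairs, one per (advertiser, value),
    where p plays the role of p_{iv}.
    Returns (objective value, threshold value at which the mass runs out).
    """
    remaining = 1.0   # unused part of the constraint sum x <= 1
    total = 0.0
    threshold = 0.0
    for v, p in sorted(prob_by_value, reverse=True):
        if remaining <= 0.0:
            break
        x = min(p, remaining)   # full mass above the threshold, partial at it
        total += v * x
        remaining -= x
        threshold = v
    return total, threshold


value, threshold = lp_max([(10, 0.3), (8, 0.5), (5, 0.4), (2, 0.6)])
print(value, threshold)
```

In this example the values 10 and 8 receive their full probability, the value 5 receives only the remaining mass, and the value 2 receives none.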

Suppose that $\tau < v_1(t)$. We arrive at a contradiction because $\sum_{i\in S(t)} \sum_{v\ge\tau} p_{iv} > \sum_{i\in S(t)} \sum_{v\ge v_1(t)} p_{iv} = 1$, which implies that we are exceeding the probability mass of 1 for the maximum. On the other hand if $\tau > v_1(t)$, then we again have a contradiction that $\sum_{i\in S(t)} \sum_{v\ge v_1(t)} p_{iv} > \sum_{i\in S(t)} \sum_{v\ge\tau} p_{iv} = 1$, which implies that the $y^*_{ijv1}$ were not feasible.

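As a consequence, $\tau = v_1(t)$ and for $v \ge \tau = v_1(t)$ we have $x^*_{iv} = y^*_{ijv1} = p_{ijv} = p_{iv}$. For $v < \tau = v_1(t)$ we have $x^*_{iv} = y^*_{ijv1} = 0$. Therefore $E[a_1(t)] \le LP_{MAX} = \sum_{i\in S(t)} \sum_v v\,x^*_{iv} = \sum_{i\in S(t)} \sum_v v\,y^*_{ijv1}$ as claimed. Applying this claim to Equation (1), together with the condition of case (i), which bounds $\varrho_1 \sum_{i\in S(t)} \sum_v v\,y^*_{ijv1}$ by a constant fraction of $\sum_{i\in S(t)} \sum_v v\,\varrho_{\ell(v)}\,y^*_{ijv\ell(v)}$, we get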



E 



M 



2J ^rflr+l(*) 



1 1 



ies(t) o 



Thus in this case the expected revenue is $\Omega(1)$ of the LP bound on the efficiency.

Suppose we are in case (ii). By Lemma 4.3 we are guaranteed an expected revenue of $\Omega(1)$ times



ei Y Y v phv> Y Y v eivijvi>z Y Y ve ^)y*nvt(v) 

ies(t) i!>Di(t) tes(t) ti>Di(t) ies(t) » 

In this case also the expected revenue is $\Omega(1)$ of the LP bound; the lemma follows. □
We are ready to prove Theorem 2.

Proof: (Of Theorem 2). Let App be the policy that approximately maximizes the efficiency. Let $OPT_G$ be the optimum GSP-Reserve policy. Let $OPT_B$ be the optimum policy which maximizes the total value.

Given a policy $P$ let Gsp($P$) denote the expected revenue of the policy if it is charged as GSP, and BestKW($P$) denote the expected (weighted) efficiency. Then for any policy Gsp($P$) $\le$ BestKW($P$). Let $R$(App) be the revenue of the policy in Lemma 4.4. Therefore, for some absolute constant $\alpha \ge 1$,

$$\text{Gsp}(OPT_G) \le \text{BestKW}(OPT_G) \le \text{BestKW}(OPT_B) \le LP1 \le \alpha\,R(\text{App})$$

The theorem follows (again appealing to Theorem 5). □

5 The Posted Price Revenue Problem 

In this section we prove Theorem 3. The flow of ideas would be similar to that of Section 3; however we would require a different LP.

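1. Let $x_{ijv}$ denote the probability that advertiser $i$ was called out on impression $j$ with price $v$.

2. Let $y_{ijv\ell}$ be the probability that advertiser $i$ was offered price $v$ and assigned to slot $\ell$. The revenue generated in this event is $\varrho_\ell v$.

Let $\bar p_{ijv} = \sum_{v'\ge v} p_{ijv'}$ denote the probability that on impression $j$, advertiser $i$ has a valuation at least $v$. The LP is:

$$
\begin{aligned}
LP4 = \max\ & \sum_j q_j \sum_i \sum_v \sum_\ell v\,\varrho_\ell\,y_{ijv\ell} \\
\text{s.t.}\ & \sum_j q_j \sum_v x_{ijv} \le \rho_i \\
& \sum_i \sum_v y_{ijv\ell} \le 1 \\
& \sum_v x_{ijv} \le 1 \\
& \sum_\ell y_{ijv\ell} \le \bar p_{ijv}\,x_{ijv} \\
& x_{ijv},\ y_{ijv\ell} \ge 0
\end{aligned}
$$

Here $B(x,y)$ denotes the last three constraints, and $A(x,y)$ denotes $B(x,y)$ together with the slot constraint $\sum_i \sum_v y_{ijv\ell} \le 1$.
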

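Decoupling: Let $\lambda^*_i$ be the optimum dual variable for the constraint $\sum_j q_j \sum_v x_{ijv} \le \rho_i$. The result of the decoupling is (subject to $A(x,y)$):

$$LP4 = LP4(\lambda^*) = \sum_i \lambda^*_i \rho_i + \sum_j q_j\,LP5(j,\lambda^*) \quad \text{where} \quad LP5(j,\lambda^*) = \max \left( \sum_i \sum_v \sum_\ell v\,\varrho_\ell\,y_{ijv\ell} - \sum_i \sum_v \lambda^*_i x_{ijv} \right)$$

Solving $LP5(j,\lambda^*)$. Let $\tau^*_{j\ell}$ be the dual of the constraint $\sum_i \sum_v y_{ijv\ell} \le 1$. And again we have a decoupling (subject to $B(x,y)$):

$$LP5(j,\lambda^*) = \sum_\ell \tau^*_{j\ell} + \sum_i LP6(j,\lambda^*,\tau^*_{j\ell},i) \quad \text{where} \quad LP6(j,\lambda^*,\tau^*_{j\ell},i) = \max \sum_v \left( \sum_\ell (v\,\varrho_\ell - \tau^*_{j\ell})\,y_{ijv\ell} - \lambda^*_i x_{ijv} \right)$$
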

The next lemma follows from inspection (assuming $x_{ijv}$ being fixed); recall that $\bar p_{ijv}$ is the probability of a valuation of at least $v$.

Lemma 5.1 In the optimum solution to $LP6(j,\lambda^*,\tau^*_{j\ell},i)$ we must have $\sum_\ell y_{ijv\ell} = \bar p_{ijv}\,x_{ijv}$ and moreover we should have

$$y_{ijv\ell} = \begin{cases} \bar p_{ijv}\,x_{ijv} & \text{if } \ell = \arg\max_{\ell'} \{ v\,\varrho_{\ell'} - \tau^*_{j\ell'} \mid v\,\varrho_{\ell'} \ge \tau^*_{j\ell'} \} \\ 0 & \text{otherwise} \end{cases}$$

Definition 3 Note that this fixes a slot function $f_{ij}(v) = \arg\max_{\ell'} \{ v\,\varrho_{\ell'} - \tau^*_{j\ell'} \mid v\,\varrho_{\ell'} \ge \tau^*_{j\ell'} \}$ for each $v$. Again note that this corresponds to the upper envelope of the lines $y = x\varrho_\ell - \tau^*_{j\ell}$ and $y = 0$. Thus $f_{ij}(v)$ can be represented as a piecewise linear function of $v$. We can view this as "$i$ is offered price $v$ with the understanding that if $i$ accepts then we will attempt to give the slot $f_{ij}(v)$ to $i$". Note slot $M+1$ corresponds to $\varrho_{M+1} = 0$.
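The slot function of Definition 3 can be evaluated directly from the slot discounts and the dual values. The sketch below assumes that the dummy slot $M+1$ has margin 0 and that ties go to the lower-numbered slot; both choices, the name `slot_function`, and the sample numbers are illustrative assumptions.

```python
def slot_function(v, discounts, taus):
    """Return the 1-based slot maximizing v*rho_l - tau_l, or M+1 if no margin is positive.

    discounts: rho_1..rho_M for the real slots.
    taus: dual values tau_1..tau_M for the impression at hand.
    Slot M+1 is the dummy slot with rho_{M+1} = 0 (assumed margin 0).
    """
    best_slot = len(discounts) + 1
    best_margin = 0.0
    for slot, (rho, tau) in enumerate(zip(discounts, taus), start=1):
        margin = v * rho - tau
        if margin > best_margin:
            best_slot, best_margin = slot, margin
    return best_slot


discounts, taus = [1.0, 0.5], [3.0, 0.5]
print([slot_function(v, discounts, taus) for v in (1, 2, 6)])
```

Low prices fall to the dummy slot, intermediate prices to the cheaper slot, and high prices to the top slot, which is the upper-envelope picture described above.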



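Therefore we have: $LP6(j,\lambda^*,\tau^*_{j\ell},i) = LP7(j,\lambda^*,\tau^*_{j\ell},i)$ where

$$LP7(j,\lambda^*,\tau^*_{j\ell},i) = \max \sum_v \left( (v\,\varrho_{f_{ij}(v)} - \tau^*_{jf_{ij}(v)})\,\bar p_{ijv} - \lambda^*_i \right) x_{ijv} \quad \text{s.t.} \quad \sum_v x_{ijv} \le 1$$

with $\bar p_{ijv}$ the survival probability defined above.

Lemma 5.2 In the optimum solution to $LP7(j,\lambda^*,\tau^*_{j\ell},i)$ we have

$$x^*_{ijv} = \begin{cases} 1 & \text{if } v = v' = \arg\max_{v} \{ (v\,\varrho_{f_{ij}(v)} - \tau^*_{jf_{ij}(v)})\,\bar p_{ijv} \mid (v\,\varrho_{f_{ij}(v)} - \tau^*_{jf_{ij}(v)})\,\bar p_{ijv} \ge \lambda^*_i \} \\ 0 & \text{otherwise} \end{cases}$$

Note that we can compute $v'$ easily because $f_{ij}(v)$ can be represented as a relatively simple function of $v$, based on Definition 3.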

Thus we are offering a unique price $v' = \arg\max_v \{ (v\,\varrho_{f_{ij}(v)} - \tau^*_{jf_{ij}(v)})\,\bar p_{ijv} \mid (v\,\varrho_{f_{ij}(v)} - \tau^*_{jf_{ij}(v)})\,\bar p_{ijv} \ge \lambda^*_i \}$ to the advertiser $i$ (with the intuitive idea that the advertiser would be considered for slot $f_{ij}(v')$). We denote $v' = \infty$ if the condition does not hold for any value of $v$ possible for $i$. The probability that the advertiser accepts is the survival probability $\bar p_{ijv'}$, and this is also $y^*_{ijv'\ell'}$ where $\ell' = f_{ij}(v')$. Therefore given $j$, to advertiser $i$ we offer a price of $v'(i,j)$ which is a function of $i,j$ with a slot $\ell(i,j) = f_{ij}(v'(i,j))$ in mind.
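The choice of the offered price can be sketched as a scan over the candidate prices of one advertiser. This is only an illustration under stated assumptions: prices come from a finite list, `survival[v]` stands for the probability that the valuation is at least `v`, `lam` stands for the learned dual value of the advertiser's bandwidth constraint, and all names and numbers are hypothetical.

```python
import math


def best_margin_and_slot(v, discounts, taus):
    """Upper envelope of the lines v*rho_l - tau_l and 0 (dummy slot M+1)."""
    best = (0.0, len(discounts) + 1)
    for slot, (rho, tau) in enumerate(zip(discounts, taus), start=1):
        margin = v * rho - tau
        if margin > best[0]:
            best = (margin, slot)
    return best


def offered_price(values, survival, discounts, taus, lam):
    """Return (price, slot) maximizing margin * survival among prices whose gain is >= lam.

    Returns (math.inf, None) when no price clears lam, i.e. no call out is made.
    """
    best_price, best_slot, best_gain = math.inf, None, -math.inf
    for v in values:
        margin, slot = best_margin_and_slot(v, discounts, taus)
        gain = margin * survival[v]
        if gain >= lam and gain > best_gain:
            best_price, best_slot, best_gain = v, slot, gain
    return best_price, best_slot


survival = {1: 0.9, 2: 0.7, 4: 0.4, 6: 0.1}
print(offered_price([1, 2, 4, 6], survival, [1.0, 0.5], [3.0, 0.5], lam=0.2))
```

A larger value of `lam` models a scarcer bandwidth budget: fewer prices clear the cut-off and the advertiser may not be called out at all.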

The Interpretation and the Rounding: We note that for every slot $\ell$ we have $\sum_{i:\ell=\ell(i,j)} y_{ijv'\ell} \le 1$. And the expected reward is $\sum_{i:\ell=\ell(i,j)} \varrho_\ell\,v'(i,j)\,y_{ijv'\ell}$.

For each slot we order the advertisers in non-increasing order of $v'(i,j)$, and we can perform the same analysis as Lemma 3.3. And for each slot we expect a revenue of $(1-\frac{1}{e})$ fraction. Since we have one slot in mind for each advertiser we can sum up across the slots and expect a $(1-\frac{1}{e})$ fraction of the LP revenue. Note that in a given scenario, after all the bids are accepted, we can make the best allocations and only increase the revenue in the process. Thus Theorem 3 follows.



6 Handling Bursts: Token Buckets and Poisson Arrivals 

We shall now present the proof of Theorem 4. Recall that the token bucket model starts with a full buffer, and thus unlimited buffer size corresponds to the time average model.

The Uniform Arrival case: We first consider the uniform arrival case, where an impression arrives every unit time step. Consider any algorithm $A$ which makes call outs according to the time average model. We define $A'$ to behave exactly as $A$ except that in the case the token bucket is empty for advertiser $i$, no call out is made to $i$ in $A'$.

Let the expected value received by $A$ in sending impressions to ad network $i$ be $R_i$, so that the total expected value obtained by our algorithm is $R = \sum_{i=1}^{n} R_i$. Let the corresponding values obtained by our algorithm in the token bucket model be $R'_i$ and $R'$ respectively, and so $R' = \sum_{i=1}^{n} R'_i$. We assume that in the beginning, each token bucket is full of tokens. This is safe to assume when the process runs for sufficiently long time $T \gg \sigma_i/\rho_i$, since one can simply ignore the impressions arriving in the first $\sigma_i/\rho_i$ units of time, and lose only a small fraction of the objective value in expectation.

Lemma 6.1 $R'_i \ge \left(1 - \frac{1}{\sigma_i - 1}\right) R_i$.

Proof: Let us focus on a single token bucket, that of ad network $i$. We simply need to calculate the expected fraction of times that an attempted sending by our algorithm to ad network $i$ succeeds. We shall show that this fraction is at least $\left(1 - \frac{1}{\sigma_i - 1}\right)$. Given a set of arriving impressions, every order of their arrival is equally likely, so every impression is equally likely to have a failed attempt at sending, and so the value obtained is at least $\left(1 - \frac{1}{\sigma_i - 1}\right) R_i$.

For every arriving impression, the algorithm decides whether to attempt sending it to ad network 
i. We lower bound the number of successful sending by the following process: send whenever 
the algorithm attempts sending till the token bucket is empty, then neglect all impressions till 
the bucket becomes full (dormant period), then again send whenever the algorithm attempts to 
do so, and so on. This can be observed to be a lower bound because for every impression that 
the algorithm attempts but fails to send while the said process succeeds, there must be a unique 
previous successful sending by the algorithm during the dormant period of the process. 

At time step $t$, let $X_t$ denote the amount of tokens in the bucket. For all $t$, $0 \le X_t \le \sigma_i$, and $X_0 = \sigma_i$. Let $\tau$ be a stopping time, defined as the first time when $X_t < 1$. It is easy to see that $E[\tau] < \infty$, since there is a non-zero probability of a burst of impressions that are sent to ad network $i$. Let $Z_t = \sigma_i - X_t$, so $Z_0 = 0$. Note that $\sigma_i \ge Z_\tau > \sigma_i - 1$. Let $Y_t$ be the number of impressions sent to ad network $i$ before or at time step $t$. Also, at every time step, let $\alpha_i \le \rho_i$ denote the probability that the algorithm sends an arriving impression to ad network $i$, i.e., $\alpha_i = \sum_j q_j x_{i,j}$. We prove the following subclaim:

Claim 1 $E[Y_\tau] \ge (\sigma_i - 1)^2 \alpha_i/\rho_i$.

Proof: First, we observe that $Y_t - Y_{t-1}$ is 1 with probability $\alpha_i$, and zero otherwise. So $(Y_t - \alpha_i t)$ is a martingale. Since $E[\tau] < \infty$, applying Doob's optional stopping theorem on this martingale, we get that $E[Y_\tau - \alpha_i \tau] = 0$, so $E[Y_\tau] = \alpha_i E[\tau]$. We shall now show that $E[\tau] \ge (\sigma_i - 1)^2/\rho_i$. Let $\mathbb{F}_t$, $t \ge 0$ denote the filtration for the sequence $\{Z_t\}$, that is, $\mathbb{F}_t$ is the information about all the values $Z_0, Z_1, \ldots, Z_t$. We now identify a process $A_t$ such that $Z_t^2 - A_t$ is a martingale, and $A_0 = 0$. Clearly, such a process exists (by Doob decomposition), and $A_{t+1} - A_t = E[Z_{t+1}^2 \mid \mathbb{F}_t] - Z_t^2$. At each time step, $Z_t$ decreases by $\rho_i$ (lower bounded by zero), and increases by 1 (upper bounded by $\sigma_i$) with probability $\alpha_i$. Thus, when $\sigma_i - 1 \ge Z_t \ge \rho_i$,

$$
\begin{aligned}
E[Z_{t+1}^2 \mid \mathbb{F}_t] &= (1-\alpha_i)(Z_t - \rho_i)^2 + \alpha_i (Z_t + 1 - \rho_i)^2 \\
&= (1-\alpha_i)(Z_t^2 - 2Z_t\rho_i + \rho_i^2) + \alpha_i\left(Z_t^2 + 2Z_t(1-\rho_i) + (1-\rho_i)^2\right) \\
&= Z_t^2 + 2(\alpha_i - \rho_i)Z_t + \alpha_i(1-\rho_i)^2 + (1-\alpha_i)\rho_i^2 \\
&\le Z_t^2 + \rho_i^2 + \alpha_i(1 - 2\rho_i) \le Z_t^2 + \rho_i
\end{aligned}
$$

The inequalities follow from the facts that $\alpha_i \le \rho_i \le 1$ and $Z_t \ge 0$. Moreover, when $\rho_i > Z_t \ge 0$, $Z_{t+1}$ is zero with probability $(1 - \alpha_i)$, and at most $1 - \rho_i + Z_t \le 1$ with probability $\alpha_i$. Thus $E[Z_{t+1}^2 \mid \mathbb{F}_t] \le \alpha_i \le \rho_i$. This implies that $A_{t+1} - A_t \le \rho_i$ and so $A_t \le \rho_i t$, for all $t \le \tau$.

Now we apply Doob's optional stopping theorem to obtain that $E[Z_\tau^2 - A_\tau] = 0$. Since $Z_\tau > \sigma_i - 1$, we have $(\sigma_i - 1)^2 - \rho_i E[\tau] \le 0$. This proves the claim. □
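The bound $E[\tau] \ge (\sigma_i - 1)^2/\rho_i$ can be sanity-checked by simulating the discrete token bucket process. The sketch below is a Monte Carlo illustration only; the function name, the parameter values, the number of trials and the order of refill and send within a step are assumptions made for the example.

```python
import random


def mean_stopping_time(sigma, rho, alpha, trials, seed=0):
    """Estimate E[tau], the first step at which fewer than one token remains."""
    rng = random.Random(seed)
    total_steps = 0
    for _ in range(trials):
        z = 0.0   # Z_t = sigma - X_t, the number of missing tokens; bucket starts full
        steps = 0
        while z <= sigma - 1:          # X_t >= 1, so a send can still succeed
            steps += 1
            z = max(0.0, z - rho)      # refill by rho tokens, capped at a full bucket
            if rng.random() < alpha:   # the algorithm sends one impression
                z += 1.0
        total_steps += steps
    return total_steps / trials


sigma, rho, alpha = 5, 0.1, 0.08
estimate = mean_stopping_time(sigma, rho, alpha, trials=300)
bound = (sigma - 1) ** 2 / rho
print(estimate >= bound)
```

With a send probability strictly below the refill rate, the empirical mean is expected to exceed the bound by a comfortable margin, since the bucket drifts back toward full between bursts.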

The process will succeed in all attempts to send impressions to ad network $i$ till time $\tau$. After this, the process is dormant and neglects all impressions for the next $Z_\tau/\rho_i$ steps, so that the bucket becomes full, and then the same argument can be repeated. During the dormant period, the expected number of attempted sending by the original algorithm is at most $Z_\tau \alpha_i/\rho_i \le \sigma_i \alpha_i/\rho_i$. So the fraction of sending attempts that are successful is at least

$$1 - \frac{\sigma_i \alpha_i/\rho_i}{(\sigma_i - 1)^2 \alpha_i/\rho_i + \sigma_i \alpha_i/\rho_i} \ge 1 - \frac{1}{\sigma_i - 1}$$

This completes the proof of Lemma 6.1.

Lemma 6.1 implies that if $\sigma_i \ge \sigma\ \forall i$, then $R' \ge \left(1 - \frac{1}{\sigma - 1}\right) R$.



Poisson Arrival Process. Now we consider the token bucket model under the assumption that the time difference between the arrival of two consecutive impressions is drawn from an exponential distribution (with mean 1, for normalization). This implies that the arrival process is a unit rate Poisson process. The tokens fill up continuously, at a rate of $\rho_i$ tokens per unit time. Extending our analysis from discrete to continuous martingales, in the proof of Lemma 6.1, we get:

Lemma 6.2 $R'_i \ge \left(1 - \frac{1}{\sigma_i - 1}\right) R_i$ in the Poisson arrival process as well.

Proof: $X_t$, $Y_t$ and $Z_t = \sigma_i - X_t$ are now continuous processes. The Doob optional stopping theorem holds for continuous martingales, and we can find a process $A_t$ such that $Z_t^2 - A_t$ is a continuous martingale. Let $\tau$ be again the smallest $t$ such that $X_t < 1$. It again boils down to showing that $E[\tau] \ge (\sigma_i - 1)^2/\rho_i$. To show this, we need to show that $A_t \le \rho_i t$ for all $t \le \tau$. Between time $t$ and $t + dt$, where $dt$ is an infinitesimal increment, there is a probability $dt$ that an impression arrives (since the arrival process is a Poisson process), and thus probability $\alpha_i\,dt$ that the algorithm attempts to send an impression to ad network $i$, while $\rho_i\,dt$ tokens are generated. Thus, when $0 < Z_t < \sigma_i - 1$, we have (neglecting higher powers of $dt$):

$$
\begin{aligned}
E[Z_{t+dt}^2 \mid \mathbb{F}_t] &= (1 - \alpha_i\,dt)(Z_t - \rho_i\,dt)^2 + \alpha_i\,dt\,(Z_t + 1 - \rho_i\,dt)^2 \\
&= Z_t^2 + \alpha_i\,dt - 2Z_t(\rho_i - \alpha_i)\,dt \le Z_t^2 + \rho_i\,dt
\end{aligned}
$$

So we infer that $A_{t+dt} - A_t \le \rho_i\,dt$, and so $A_t \le \rho_i t\ \forall t$. Finally, applying Doob's optional stopping theorem for continuous martingales, we get that $E[Z_\tau^2 - A_\tau] = 0$, which implies that $(\sigma_i - 1)^2 - \rho_i E[\tau] \le 0$. □

This completes the proof of Theorem 4.



7 Experimental Study 

In this section we experimentally explore aspects of our model and performance of algorithms, 
focusing solely on sales as our measure of performance in our simulations, and sticking to the basic 
mechanism where ad networks report their bids. In each online query, there is a single impression 
with discount factor 1, and a minimum price set by the publisher. The impression is considered 
to be sold if at least one of the ad networks that are called out returns a bid higher than this 
minimum price. The objective is to maximize the number of impressions sold. Sales can be viewed as a special case of the efficiency problem, where all bids are 0 or 1 (they are willing to buy the item at a minimum price quoted by the publisher, or not), and so the corresponding algorithm is applicable. The following questions are important.

1. How much does estimating the bid distributions help? We assumed that the Exchange will estimate the survival probability $p_{ijk}$ for ad network $i$, impression $j$ and bid $k$ (that is, the probability that the bid is at least $k$), via machine learning and data mining techniques. Can we require less out of these techniques? Motivated by sponsored search systems, it is tempting to only estimate the expected bid for each pair $(i,j)$ rather than the entire distribution of $p_{ijk}$. Further, for the sales metric, it suffices to only estimate the probability of a bid above $k$ for the minimum price $k$, while for the value metric, we use the entire distribution of $p_{ijk}$. We compare the performance of algorithms that use different amounts of information about the $p_{ijk}$.

2. How does the error in estimation affect the performance of algorithms? Methods that estimate $p_{ijk}$ will have errors. We study the influence of such errors on our algorithms.

3. What is the benefit of the optimization over simple, natural schemes? We evaluate our LP- 
based solution and compare it to other simpler schemes. 

Before proceeding to the simulation setup, we list the algorithms we consider. The linear program- 
ming based algorithm uses probability thresholds to decide on call outs. However alternative (and 
simpler) implementations typically would choose subsets based on some criterion. Thus we have a 
natural partition of set and threshold based algorithms. 



Set Based Algorithms: All our candidate algorithms sort the ad networks according to some 
criterion, which may depend on the arriving impression as well as history of the process. The 
algorithms then attempt to call out to the top ad networks in this ordering, which succeeds if 
those ad networks have bandwidth remaining. It remains to specify how many ad networks are 
picked for a particular impression. Set based algorithms are simply those that pick the top $k$ ad networks for every impression. The first two algorithms are baseline algorithms that are motivated by scheduling algorithms, but they ignore the probability estimations. The next two algorithms use the probability estimation to guide the call outs (and are based on different but widely occurring ways of characterizing high probability subsets).

1. Random: chooses a random ordering. 

2. MaxRemBand: orders by (decreasing) remaining bandwidth. 

3. MaxProb: orders by (decreasing) survival probability. 

4. MaxExp: orders by (decreasing) expectation of a bid. While expected bid value can be 
clearly shown to be inadequate on specially designed examples, we evaluate the value of this 
information on our more generally generated data sets. 
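The four set based algorithms differ only in the ordering criterion, so they can share one routine. The following is a minimal sketch; the dictionary keys (`name`, `tokens`, `survival_prob`, `expected_bid`) and the function name are hypothetical and chosen for this example only.

```python
import random


def set_based_call_out(networks, k, criterion, rng=random):
    """Order networks by `criterion` and attempt call outs to the top k.

    networks: list of dicts with hypothetical keys 'name', 'tokens',
    'survival_prob' and 'expected_bid'. A call out succeeds only if the
    network has at least one token, which is then consumed.
    """
    if criterion == "Random":
        order = list(networks)
        rng.shuffle(order)
    else:
        key = {
            "MaxRemBand": "tokens",
            "MaxProb": "survival_prob",
            "MaxExp": "expected_bid",
        }[criterion]
        order = sorted(networks, key=lambda n: n[key], reverse=True)
    called = []
    for net in order[:k]:
        if net["tokens"] >= 1:   # failed attempts are not replaced by other networks
            net["tokens"] -= 1
            called.append(net["name"])
    return called


nets = [
    {"name": "a", "tokens": 2, "survival_prob": 0.1, "expected_bid": 5.0},
    {"name": "b", "tokens": 0, "survival_prob": 0.9, "expected_bid": 1.0},
    {"name": "c", "tokens": 3, "survival_prob": 0.5, "expected_bid": 3.0},
]
print(set_based_call_out(nets, 2, "MaxProb"))
```

In the example, network `b` ranks first by survival probability but has no bandwidth left, so only `c` is actually called.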

Threshold Algorithms: The above algorithms all have a parameter $k$, the size of the set of ad networks to which call out is attempted for each impression (subject to bandwidth availability). Threshold algorithms replace $k$ by a probability threshold. Each of the algorithms in this class induces an ordering among ad networks and chooses a prefix of this ordering such that the survival probabilities $\sum_i p_i x_i$ of the chosen ad networks add up to a threshold. If the sum exceeds the threshold, then the algorithm probabilistically decides whether to attempt call out to the last ad network in the chosen ordering. The probability of this event is set such that the sum of survival probabilities times the probability of getting called out sums up to exactly the threshold. Note that our LP-based algorithm is close to a threshold algorithm with the threshold set at 1, except that the Lagrangians $z^*$ that were learned also act as a cut-off. We found this algorithm to be conservative in spending the call outs and decided to set this to a variable threshold. The class of threshold algorithms is justified by Lemma A.1. Below are all the threshold algorithms that we consider (again, we only make the call outs if the actual bandwidth is available).

1. Th-Random: chooses a random ordering.

2. Th-MaxRemBand: orders on maximum remaining bandwidth. 

3. Th-Prob: orders on (decreasing) survival probability. 

4. Th-LP: This is the LP based algorithm discussed above. 
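The prefix selection shared by the threshold algorithms can be sketched as follows. The function name and the `rng` parameter are assumptions for illustration, and bandwidth checks are omitted to keep the example short.

```python
import random


def threshold_call_out(ordered_probs, threshold, rng=random):
    """Choose a prefix whose expected survival mass equals `threshold`.

    ordered_probs: survival probabilities in the order induced by the algorithm.
    Returns the indices of the networks to which a call out is attempted.
    """
    chosen, mass = [], 0.0
    for idx, p in enumerate(ordered_probs):
        if mass + p <= threshold:
            chosen.append(idx)
            mass += p
        else:
            # Call the last network with the probability that makes the
            # expected total survival mass exactly equal to the threshold.
            if p > 0 and rng.random() < (threshold - mass) / p:
                chosen.append(idx)
            break
    return chosen


print(threshold_call_out([0.5, 0.25, 0.5, 0.9], 1.0, rng=random.Random(0)))
```

Here the first two networks are always chosen (mass 0.75) and the third is chosen with probability 0.5, so the expected mass is exactly 1.0; the fourth network is never considered.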



7.1 The Simulation Setup 

We simulate the above algorithms using synthetic data generated from specific natural distributions, 
and vary parameters in the data generation such as impression arrival rate and range of minimum 
prices of impressions. 

Figure 1: Set based and Threshold algorithms for Gaussian bid distributions

Figure 2: Set-based and Threshold algorithms for Pareto bid distributions

Implementation: There are 32 ad networks, and the bandwidth of each ad network is implemented as a token bucket or a buffer. Tokens get generated in a bucket at a uniform rate, which reflects the bandwidth, and an attempt to call out to an ad network succeeds if and only if there is at least one token, which is consumed by the call out, in the bucket corresponding to the ad network. Moreover, a token bucket has a limit on the number of tokens it can store, and if the bucket is full, tokens generated at that time are lost. This reflects the burst-size allowed in the communication. Unless specified otherwise we use a bucket size of 5 for most of our simulation results (see Section 7.4). The ad networks have (continuous) token generation rates chosen uniformly at random between 5 and 50 per unit time. Impressions arrive according to a Poisson clock with a fixed rate. This rate is stated as the expected time lapse between two consecutive impressions. Again, unless specified otherwise, we use a rate of 0.003 for the Poisson clock. The average token generation rate per impression, $\rho_i$, varies from 0.015 to 0.15 when the rate of the Poisson clock is 0.003.
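A token bucket with continuous refill, as described in the implementation paragraph, can be sketched as below. The class and method names are assumptions for illustration and are not taken from the simulation code used in the paper.

```python
class TokenBucket:
    """Token bucket with continuous refill; one token is consumed per call out."""

    def __init__(self, rate, capacity):
        self.rate = rate                # tokens generated per unit time
        self.capacity = capacity        # burst size; tokens beyond it are lost
        self.tokens = float(capacity)   # the bucket starts full
        self.last_time = 0.0

    def try_call_out(self, now):
        """Refill up to time `now`, then consume one token if available."""
        elapsed = now - self.last_time
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last_time = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


bucket = TokenBucket(rate=2.0, capacity=2)
print([bucket.try_call_out(0.0) for _ in range(3)])
```

In a simulation loop, inter-arrival times with mean 0.003 could be drawn with `random.expovariate(1 / 0.003)` from the Python standard library and accumulated into the `now` argument.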

Bid distribution: All bids are drawn from bounded distributions, which put all their probability mass in the range $[0, R]$ for a fixed positive value $R$. Gaussian is a very commonly seen distribution, with exponential decay, while Pareto is a heavy-tailed distribution which has polynomial decay (power law), and is also often observed in online ad scenarios. The means of these Gaussian or Pareto distributions are chosen uniformly at random from the range $[0, 0.5R]$. The Gaussian distributions are then given a standard deviation uniformly between 0 and 0.5 times the mean, while the degree of the polynomial pdf of Pareto distributions is uniformly chosen between 2 and 5. Subsequent to their choice, all distributions were truncated to $[0, R]$; note that the truncation affects the expectation and survival probabilities.

Verticals and Minimum Prices: Different types of impressions should have different value to an ad network, so we have 10 different types of impressions, which we refer to as verticals. Each arriving impression is assigned a vertical uniformly at random. The bid distribution of each ad network for an impression depends on its vertical only. However, the survival probabilities can still vary for impressions from the same vertical, because their minimum prices are independently chosen. Thus, impressions are drawn iid from a fixed distribution. 2000 impressions arrive in each simulation, while our LP-based algorithm is given 500 impressions to learn from the same distribution of impressions.

Figure 3: Set-based algorithms for bid distributions with higher minimum prices

The minimum prices of each impression come from the range [0.2R, R] for most of the simulations, 
but we also verified our broad findings when the range of minimum prices is [0.5R,R]. Higher 
minimum prices reflect lower levels of maximum possible sales. 

For our set-based algorithms, we try out all powers of 2 for the value of k, that is, 1, 2, 4, 8, 16 and 
32. For our threshold algorithms, we try 0.5, 1.0, 1.5, 2.0, 2.5 and 3.0 as our threshold values. This 
completes the description of the simulation set up, and we are now ready to describe our findings. 

7.2 Performance of various Strategies 

We tried our algorithms for 2000 impressions in each simulation. While this stream may seem small, we checked the standard deviation of our results when different streams drawn from the same distribution were used, and found it to be sufficiently small. With 10 different such streams, the standard deviation was found to be about 2% or less for all the algorithms. The performance (along with error bars indicating the deviation) of the various strategies is plotted in Figures 1, 2 and 3. While the performance of threshold algorithms is plotted against the threshold parameter, that of the set based algorithms is plotted against $k$ on a logarithmic scale, that is, $\log k$. BEST-LP indicates the performance of our LP-based algorithm for optimal choice of threshold, and is shown along with set based algorithms for comparison. OPT-UB indicates the upper bound on an optimal policy that is given by LP1 in the previous section. Note that the real offline optimum can be significantly smaller, but is hard to compute. The following conclusions emerge:

1. The LP-based algorithm surpasses the performance of the other algorithms by a large margin, about 20% more than the closest performer with optimal parameter choice. This gap grows to 85% when the reserve prices are raised, and the performance of all the algorithms (and also the optimum) falls. Thus, our LP-based algorithm is even more valuable in a world where fewer items can be sold. Figure 3 shows the effect of raising the minimum prices for set based algorithms.

2. The algorithms Random and MaxRemBand perform quite poorly compared to all other algorithms that use information about bid distributions. This shows that bid estimation is useful.

3. The performance of MaxExp was better than Random and MaxRemBand, but was worse than MaxProb. Thus for the distribution used in the simulation, information about expectation was useful, but less useful than the survival probability of the bids.

4. The other threshold based algorithms perform quite poorly. In the rest of the experimental evaluation, we will ignore these algorithms.

Figure 4: Set-based algorithms for different bid distributions with erroneous bid estimations



7.3 Robustness and Error estimation 

Estimating survival probabilities can be a tough problem, and can be loosely compared to the problem of estimating click-through-rates. It is thus very likely that the estimates will be inaccurate, and it is very important to understand how useful such erroneous estimates are. We add normally distributed noise to the survival probabilities (rounding to 0 or 1 if the resulting value is below 0 or above 1, respectively), with standard deviations ranging from 0.05 to 0.15. We find that the algorithms using this information lose some performance, but still perform consistently better than the algorithms without any information about bid distributions. We show the plots for both distributions in Figure 4. For comparison, we choose to represent an algorithm by its peak performance, that is, its performance for optimal choice of parameters.
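The noise model described above is short enough to state in code. This is a sketch only; the function name is an assumption, and the noise is taken to have zero mean, which the text does not state explicitly.

```python
import random


def perturb_survival_prob(p, std_dev, rng=random):
    """Add zero-mean Gaussian noise to p and clip the result to [0, 1]."""
    noisy = p + rng.gauss(0.0, std_dev)
    return min(1.0, max(0.0, noisy))


rng = random.Random(1)
samples = [perturb_survival_prob(0.9, 0.15, rng) for _ in range(5)]
print(all(0.0 <= s <= 1.0 for s in samples))
```

Clipping matters most for probabilities near 0 or 1, where a standard deviation of 0.15 frequently pushes the noisy estimate outside the valid range.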



7.4 Sensitivity analysis 

For all the plots described above, the size of the token buckets was fixed at 5 and the arrival rate was fixed to 0.003. We tried out bucket sizes of 2, 5, 15, and 45. As expected, the performance of each algorithm is greater when bucket size is larger, but there were no substantial differences in the conclusions we observed. In fact, the performance of the algorithms was within a standard deviation of that for a different bucket size (we omit showing them to avoid clutter). There was no difference in performance between bucket sizes 15 and 45. We superimpose the performance plots of bucket sizes 2, 5, 15 and show this plot for Pareto distributions in Figure 5. We also varied impression arrival rate to note its effect on our algorithms. We found, as expected, that performance improves if arrival rate slows down, but the relative performances of the algorithms remain unchanged. Figure 6 shows this for the peak performances of the algorithms, for Pareto bid distributions.

The plots for Gaussian distributions and threshold algorithms are similar in both cases.

Figure 5: Set based algorithms for Pareto bid distributions with burst sizes 2, 5 and 15



Figure 6: Peak performances of algorithms for different impression arrival rates for Pareto bid distributions (horizontal axis: impression rate, that is, average time between consecutive impressions)



8 Conclusion & Future Work

We initiate a formal study of bandwidth and resource constraints faced by ad exchanges and ad 
networks as they move into the arena of real time bid solicitation. Our conceptual framework 
of learning bid distribution followed by online decision making should be useful for further study 
of real time bidding systems. Our online algorithms are fast and can be implemented to run in 
real time within this conceptual framework. The technical framework of our algorithm, with a 
short learning phase followed by a long online phase, can generally be applied to any problem that 
involves maximizing linear objectives, or objectives that can be approximated by a concave or linear 
objective, with linear constraints.

A The Adversarial Model

We show that no online algorithm, deterministic or randomized, can give any approximation to the offline optimum, even when the instance has only one ad network, and the bids are 0 or 1.

Theorem 6 For any given $0 < \alpha < 1$, there exists no randomized algorithm which is $\alpha$-approximate, if the length of the sequence is arbitrarily large.

Proof: Maximizing the sales objective in the presence of only one ad network subsumes the online knapsack problem (the value of an item is the probability that the bid exceeds minimum price) with unit size of items, for which no algorithm can give better than $1/\Omega(\log m)$ approximation [18]. If $m$ is allowed to be large, then no algorithm giving approximation independent of $m$ can exist. □

One key feature of the adversarially constructed sequences in the theorem above is that OPT can 
be arbitrarily small compared to the length of the sequence. This is unlikely to happen; moreover, 
if it were to happen, then there would not be much benefit out of constructing the real time 
bidding system. Thus we make the assumption that OPT is reasonably large. But even with that 
assumption, we don't fare particularly well. Following is a simple observation about sales.

Lemma A.1 For a given impression $j$, suppose ad network $i$, with survival probability $p_{ij}$, is called out with probability $x_{ij}$. Then the probability of selling the impression is $1 - \prod_{i=1}^{n}(1 - p_{ij}x_{ij})$. If $\sum_{i=1}^{n} p_{ij}x_{ij} = c > 0$ then the probability of selling the impression is at most $c$ and at least $1 - e^{-c}$.

Proof: The probability of selling the impression is $1 - \prod_{i=1}^{n}(1 - p_{ij}x_{ij})$. It is upper bounded by $\sum_i p_{ij}x_{ij} = c$. The probability is lower bounded by $1 - \prod_{i=1}^{n}(1 - c/n) = 1 - (1 - c/n)^n$, where $n$ is the number of ad networks, and this is lower bounded by $1 - e^{-c}$. □
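The two standard bounds used in this proof, the union bound $\sum_i p_{ij}x_{ij}$ from above and $1 - (1 - c/n)^n \ge 1 - e^{-c}$ from below, can be checked on a small instance. The function name and the sample probabilities below are assumptions for illustration.

```python
import math


def sale_probability_bounds(probs, call_out_probs):
    """Return (lower, exact, upper) for the probability of selling one impression.

    probs: survival probabilities p_ij of the ad networks.
    call_out_probs: probabilities x_ij with which each network is called out.
    """
    products = [p * x for p, x in zip(probs, call_out_probs)]
    c = sum(products)
    exact = 1.0 - math.prod(1.0 - px for px in products)
    n = len(products)
    lower = 1.0 - (1.0 - c / n) ** n   # AM-GM bound, itself at least 1 - exp(-c)
    return lower, exact, c


lower, exact, upper = sale_probability_bounds([0.5, 0.2, 0.8], [1.0, 0.5, 0.25])
print(1.0 - math.exp(-upper) <= lower <= exact <= upper)
```

The lower bound is tight when all products $p_{ij}x_{ij}$ are equal, and the upper bound is tight when a single network carries all of the mass.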

Theorem 7 Suppose it is promised that OPT $\ge \delta m$, where $\delta > 0$ is known to the algorithm. Then, no algorithm can be better than $1/\Omega(\log(1/\delta))$-approximate for the sales or total value objective. Moreover, a simple randomized algorithm is $1/O(\log(n/\delta))$-approximate for the sales objective.

Proof: The hardness instances for online knapsack problems also imply that if the value of all items are guaranteed to lie between $L$ and $U$, $L < U$, then no algorithm can give better than $1/\Omega(\log(U/L))$ approximation [18], if $m$ is allowed to be arbitrarily large. The promise OPT $\ge \delta m$ allows us to construct any instance where values of items range from $\delta$ to 1, thus yielding the lower bound. For the upper bound on sales, we use the following simple algorithm: Let us assume, wlog, that $\delta$ is a negative power of 2. Choose a random cut-off $t$ from the set $H = \{\delta/2, \delta, 2\delta, 4\delta, \ldots, 1\}$. Now send each impression to any $2/t$ ad networks that have survival probability between $t$ and $2t$ and have bandwidth remaining. If there are fewer than $2/t$ such qualifying ad networks for an impression, then send it to all qualifying ad networks. This algorithm gives a $1/O(\log(n/\delta))$ approximation; the proof is standard. □
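The randomized cut-off algorithm from this proof can be sketched as follows. The function names, the inclusive treatment of the interval $[t, 2t]$, the choice of the first qualifying networks in index order, and the integer token counts are assumptions made for the example.

```python
import random


def cutoff_set(delta):
    """H = {delta/2, delta, 2*delta, ..., 1}, assuming delta is a negative power of 2."""
    cutoffs, t = [], delta / 2
    while t <= 1:
        cutoffs.append(t)
        t *= 2
    return cutoffs


def call_out(survival_probs, tokens, t):
    """Send one impression to at most 2/t networks with survival probability in [t, 2t]."""
    limit = int(2 / t)
    called = []
    for i, p in enumerate(survival_probs):
        if len(called) == limit:
            break
        if t <= p <= 2 * t and tokens[i] >= 1:
            tokens[i] -= 1   # the call out consumes bandwidth
            called.append(i)
    return called


rng = random.Random(0)
t = rng.choice(cutoff_set(0.125))   # drawn once, before the sequence starts
print(call_out([0.3, 0.6, 0.9, 0.55, 0.7, 0.8], [1, 1, 0, 1, 1, 1], 0.5))
```

The cut-off is drawn once and then applied to every impression, so the algorithm commits to one probability scale for the whole sequence.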
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